language processing and text mining
Contributions to the Improvement of Question Answering Systems in the Biomedical Domain
This thesis addresses question answering (QA) in the biomedical domain, which poses several specific challenges, such as specialized lexicons and terminologies, the types of questions handled, and the characteristics of the targeted documents. We are particularly interested in studying and improving methods for finding short, accurate answers to biomedical natural language questions in large collections of English-language biomedical text. QA aims at providing inquirers with direct, short and precise answers to their natural language questions. In this Ph.D. thesis, we propose four contributions to improve the performance of QA in the biomedical domain. In our first contribution, we propose a machine learning-based method for question type classification that determines the type of a given question, which enables a biomedical QA system to use the appropriate answer extraction method. We also propose another machine learning-based method that assigns one or more topics (e.g., pharmacological, test, treatment) to a given question in order to determine the semantic types of the expected answers, which are very useful in generating specific answer retrieval strategies. In the second contribution, we first propose a document retrieval method to retrieve, from the MEDLINE database, a set of relevant documents that are likely to contain the answers to biomedical questions. We then present a passage retrieval method to retrieve a set of passages relevant to the questions. In the third contribution, we propose specific answer extraction methods to generate both exact and ideal answers. Finally, in the fourth contribution, we develop a fully automated semantic biomedical QA system called SemBioNLQA, which is able to deal with a variety of natural language questions and to generate appropriate answers by providing both exact and ideal answers.
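To make the idea of machine learning-based question type classification concrete, here is a minimal sketch using a multinomial Naive Bayes model over bag-of-words features. It is not the method proposed in the thesis, whose features and learning algorithm are not described in this abstract. The label names (`yesno`, `factoid`, `list`), the training questions, and the class `NaiveBayesQuestionClassifier` are all hypothetical and chosen for illustration only.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesQuestionClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.class_counts = Counter()            # label -> number of questions
        self.vocab = set()

    def fit(self, examples):
        for question, label in examples:
            tokens = tokenize(question)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, question):
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(question) if t in self.vocab]
        total_questions = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_questions in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_questions / total_questions)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training questions; a real system would need far more data.
TRAINING = [
    ("Is aspirin used to treat fever?", "yesno"),
    ("Does metformin cause weight loss?", "yesno"),
    ("Which gene is mutated in cystic fibrosis?", "factoid"),
    ("What is the target of imatinib?", "factoid"),
    ("List the symptoms of measles.", "list"),
    ("List drugs used to treat asthma.", "list"),
]

classifier = NaiveBayesQuestionClassifier().fit(TRAINING)
predicted = classifier.predict("List the side effects of metformin.")
```

In a QA pipeline, the predicted type would then select which answer extraction routine to run. With only six training questions the model mostly keys on the opening word of the question, so it should be read as an illustration of the control flow rather than a usable classifier.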
Natural Language Processing and Text Mining Without Coding
Data Scientists with Natural Language Processing & Text Mining Skills are the Hottest and Most In-Demand Job Applicants Today! Data Scientist was recently dubbed "The Sexiest Job of the 21st Century" by Harvard Business Review, Glassdoor named Data Scientist the "Best Job in America for 2016," and business media from Forbes to The New York Times frequently report on the increasing demand for data scientists. Most of this boom relies on organized, structured data from databases and spreadsheets, but a huge opportunity awaits in untapped unstructured text data (tweets, Facebook posts, blog posts, comments, SMS, chats, voice transcripts, etc.). Within the data science field, natural language processing is an extremely hot area in academia and startups, and it is just starting to be used widely in mainstream corporate America. Data scientist job postings requiring natural language processing skills roughly doubled in 2016.
Difference between natural language processing and text mining
Natural language processing and text mining: discover the main differences. When it comes to analyzing unstructured data sets, a range of methodologies are used. To describe text mining, often referred to as text analytics, I like this definition from Oxford: "the process or practice of examining large collections of written resources in order to generate new information." The goal of text mining is to discover relevant information in text by transforming the text into data that can be used for further analysis. Text mining accomplishes this through the use of a variety of analysis methodologies; natural language processing (NLP) is one of them.
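As a small illustration of "transforming the text into data," the sketch below builds a document-term matrix of raw word counts, one common bag-of-words representation. The function names `split_words` and `document_term_matrix` and the two sample sentences are assumptions made for this example, and real text mining tools typically add steps such as stop-word removal or weighting.

```python
import re
from collections import Counter


def split_words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix); each row holds one document's word counts."""
    counts = [Counter(split_words(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical customer comments standing in for unstructured text.
docs = ["The service was slow.", "Slow service, slow refunds."]
vocabulary, matrix = document_term_matrix(docs)
# vocabulary → ['refunds', 'service', 'slow', 'the', 'was']
# matrix     → [[0, 1, 1, 1, 1], [1, 1, 2, 0, 0]]
```

Once text is in this numeric form, it can be fed to ordinary analysis methods such as counting, clustering, or classification.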